ACCELERATED TEMPLATE MATCHING USING LOCAL 
STATISTICS AND FOURIER TRANSFORMS 
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Abstract — This paper presents a method to accelerate correlation-based 
image template matching using local statistics that are computed by 
Fourier transform cross correlation. This approach is applicable to 
several different metrics. The concept is based upon equivalent spatial 
and frequency domain principles. Each metric is computed completely 
in the frequency domain using Discrete Fourier Transforms. Timing 
results are shown to be independent of the size of the smaller template 


image. 


1. INTRODUCTION 


Image registration is an operation that aligns the pixels of one image to the 
corresponding pixels of another image. There are many goals that are typical 
of image registration. Some of these include: detecting changes between 
images (as in vegetation analysis in remote sensing and industrial parts 
quality control), aligning multiple images prior to creating a mosaic (in 
remote sensing) and looking for similar images (for content based image 
retrieval and fingerprint analysis). Numerous approaches have been 


proposed, which include: pixel-based template matching, feature matching, 
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area matching, shape matching, transform analysis matching and heuristics 


matching. Detailed descriptions can be found in numerous books and survey 


papers [1 ]-[7]. 


2. BACKGROUND 


This paper focuses on pixel-based template matching via correlation metrics. 
This is an old and traditional method where a small image is moved one 
pixel at a time over a larger image. For each shift position, a metric is 
computed pixel by pixel between the small image and the correspondingly 
sized region of the larger image. The position where the metric value is 
largest or smallest, depending upon the metric, identifies the shift position 


for which the small image best matches with the large image. 


One of the most common metrics is the normalized cross correlation (NCC), 


which can be expressed in the spatial domain as 


S (Sti) - My)((LG +h, +k) - M,)) 
NCC(h,k) = ——! 75: (1) 
{Suan} Sluvenes—m} 
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Here S(i,j) is the small image, L(1,j) is the large image, Ms is the mean of the 
small image, M; = M_;(h,k) is the mean of the subsection of the large image 
at offset (h,k), N is the number of pixels in the small image and NCC(h,k) is 
the normalized cross correlation metric at offset (h,k). The numerator is 
essentially a simple cross correlation, but using a zero mean small image and 


zero mean subsections of the larger image. The mean subtraction mitigates 


Page 2 


brightness differences between the two images. The denominator is included 
so that the resulting correlation metric ranges from -1 to 1. A perfect match 


has a value of 1. 


Normalized cross correlation, as described by equation (1), is 
computationally intensive and slow. Part of the complexity has to do with 
evaluating the numerator correlation in the spatial domain when the template 
image is large. The other aspect that adds to the complexity is the 
computation of the mean and standard deviation of each subsection of the 


larger image. 


A number of techniques have been used to speed up these computations. A 
simple approach uses a coarse to fine search strategy. The images are 
reduced in size and the correlation metric is evaluated and the best match 
found. Then the matching is repeated at full resolution, but only in the 
neighborhood of the coarse match location [8][9]. A variation on this theme 


involves pyramidal search techniques [10][11]. 


The Bounded Partial Correlation method uses a sufficient condition test at 
each shift position to rapidly skip most of the expensive calculations 
involved in the NCC scores at those points that cannot improve the best 


score found so far [12]. 


Another approach skips the normalization and computes the simple cross 
correlation, C(h,j), using forward and inverse Fourier transforms. A basic 
principle of Fourier transforms is that convolution in the spatial domain is 


equivalent to multiplication in the frequency domain. Likewise, correlation 
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in the spatial domain is equivalent to multiplication in the frequency domain 


using the complex conjugate of one of the transformed images. 


For simple cross correlation, the Fourier transform procedure is as follows. 
First pad the smaller image with zeros at the bottom and right sides to fill it 
out to the size of the larger image. Next, apply the Fourier transform to the 
both the padded small image and the large image. Then, take the complex 
conjugate of one of them and multiply the two together. Finally, take the 
inverse Fourier transform. This process is much faster than doing the un- 
normalized correlation in the spatial domain. This spatial and frequency 


domain equivalents may be expressed as 


Cth, j) = Y SG JXL + hj +k) = FHF OFOD} =S@L, (2) 
iJ 
where F is the Fourier transform, F” is the complex conjugate of the Fourier 
transform, F” is the inverse Fourier transform and S is padded with zeros to 
the same size as the large image. A@B, is defined as a shorthand notation for 
the forward and inverse Fourier transform cross correlation process between 


any two images A and B.’ This nomenclature will be used extensively in the 


subsequent sections. 


If the Fourier transforms of the two images are divided by their magnitudes 
as a form of normalization, then the inverse Fourier transform of the product 


is called phase correlation [13]. The downside here is that it bypasses the 


2 To avoid normalization corrections, it is best that the internal Fourier Transform 
normalization, (1/total pixels) is computed in the inverse Fourier Transform 
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proper normalization. Furthermore, it is based only on phase information 


and is insensitive to changes in the image’s intensity. 


Lewis [14][15] used a mixed spatial and Fourier transform approach to 


compute the NCC. He pointed out that (1) can be expressed as 


S (Si) - Ms (LG +h,j+)- M, X (SG j)- Ms) 
NCC(h,k) = + a = (3) 


{Ssi.n-m,) Ztorm) 


ij ij 





Furthermore, he noted that the second term in the numerator is zero, because 


È S(j) = ÈM; = NMs. This allowed him to compute the numerator with the 


Fourier transform cross correlation as in (2) after subtracting the mean from 
the small image. On the other hand, he computed the large image’s 
denominator term in the spatial domain using summed area tables [16] to 
speed up that part of the computation. The summed area tables were used to 
evaluate the mean and mean squared of each subsection of the large image 


very quickly. 


Lastly, others have used variations the Lewis technique using summed area 
tables to compute the denominator. But they have also computed the 
numerator using summed area tables [17] or with a weighted sum of basis 


functions[ 18]. 


The method described in the following sections computes the NCC and other 


metrics using Fourier transform correlations alone. 
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3. LOCAL STATISTICS 


In (3), the two denominator terms can be separate and by definition are just 
the standard deviation of the small image, Os, and the standard deviation of 
the large image’s subsections, OL = O;(h,k). Therefore, (3) may be 


expressed as 


S S'@A(LG+ h,j +) 


NCC(h,k) = + ; 4 
(h,k) NOO, (4) 





where S'(1,j) = S(j) - Ms. 


In the spatial domain, the local sums of L at each offset position of the small 
image relative to the large image can be computed as a correlation of the 
large image with a rectangular kernel, U, the size of the small image having 
unit weights at each element. The mean values, Mı are then achieved by 
dividing the sums by N. The important factor here is that the local mean 
image is achieved from just a correlation with a uniform kernel of unit value 


weights. This can be expressed as 


M, (h,k) = PY UG, DL +h, j+ k). (5) 


i,j 


As mentioned earlier, correlation in the spatial domain is equivalent to 
multiplication in the frequency domain (with one Fourier transform term 


conjugated). The corresponding frequency domain image, U, is then just an 
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image the size of the small image with all pixel values equal to unity, but 
padded with zeroes at the bottom and right sides to the size of the large 
image. This is in analogy to the padding process used when cross correlating 
the small and large images using Fourier transforms as described earlier. 


Therefore, the Fourier transform analogy to (5) is just 
M,(h,k) = M, = (lo @L). (6) 


For the standard deviation, Or, one may recast it in variance form as 


O,(h,k) =O, = Pgri- Egne] | (7) 





By definition, the standard deviation of x is just the square root of the 
variance of x, which is equal to the mean of the square of x minus the square 
of the mean of x. Therefore, the standard deviation of each subsection of L 
can be expressed, using the shorthand notation for the Fourier transform 


correlation process, as 





_[wer) uer] 


O,(h,k) =O, N N? 


(8) 


4. CORRELATION METRICS 


In this section, several different correlation metrics will be expressed using 


the Fourier transform correlation method. 


4.1 Normalized Cross Correlation 
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Equations (6) and (8) may be substituted into equation (4) to give the final 
Fourier transform format of the Normalized Cross Correlation for a 


grayscale image. 


SOL 
o {NUEL -USL 





NCC(h,k) = (9) 


Equation (9) shows that the normalized cross correlation can be evaluated 
using only 3 simple correlations via Fourier transforms. For color images, 
either the images are converted to grayscale first or the correlation is 
performed on each color channel and the results combined. This author first 
used this approach’ in a template matching study of pairs of images where 
one image was a photograph and the other was a synthetic thermal image. 
Test condition variations included different thermal wavelengths, lighting 


conditions and noise levels [19]. 


Sun, et. al. [20] and later Papamakarios [21] used a similar local statistics 
method to perform normalized cross correlation solely using Fourier 
transforms. However, they reduced the computational complexity to what 
they called 2.5 FFTs. Their approach combined L and L? into one complex 
expression, (L+iL’), before applying a forward Fourier transform, F(L+iL’), 
and then recovered the separate correlations from 


F)(U*L) + iF '(U*L). 


4.2 Root Mean Squared Error 





3 Technique only reported verbally at the conference presentation 
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For a grayscale image, the root mean squared error metric may be expressed 


as 


RMSE = [RBG -L(i+h,j+ 0) (10) 


tJ 


or with expansion as 


Ce eae eee eee oe 
ause -1231s (i) -286 pUGrh +b) isho] : (11) 


ij 


Since the small image is independent of (h,k), its squared sum is a constant. 
So, one may fill out a new small sized image, T, with uniform values of this 
sum and then pad it with zeroes to the size of the larger image. Then, (11) 


may be converted to Fourier transform correlation form as 
1 0.5 
RMSE -l (r-xs 3DU) , (12) 


For a color image, the argument inside the radical would be evaluated for 
each channel, added together and divided by the number of channels. The 


RMSE metric is unbounded and a perfect match has a score of 0. 


4.3 Dot Product Correlation 
When the two images are dissimilar in sensors or lighting, such as the case 
in [19], it becomes advantageous to use the intensity values of edge- 


extracted images rather than the raw image values. Either of the above two 
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metrics may be used in this case. However, if edge directions are also 
extracted, then a dot product like metric may provide better results. Such a 


metric may be expressed as 


ppc = Y Sid i+ hj +8) + SED i+ hj + 8 
7 NS (injLy (i+ hj +4) 





(13) 


where subscripts X,Y, M correspond to the x gradient direction image, the Y 
gradient direction image and the gradient magnitude image. The latter is 
simply the square root of the sum of squares of the two gradient direction 
images. Each gradient derivative component may be divided by its 
respective magnitude. This will be indicated below with an apostrophe. If 
the resulting small images are padded with zeros to the size of the large 


image, then (13) may be converted to Fourier transform correlation form as 
1 
DPC =~ (S; © Ly +5, OL). (14) 


A variation of this dot product correlation metric was also used in [19]. The 
DPC metric has a range of values between -1 and 1 and a perfect match has 


a value of 1. 


5. RESULTS 

Equations (9), (12) and (14) were each implement as Unix bash shell scripts 
using the open source, cross-platform, image processing suite called 
Imagemagick [22]. It utilizes the open source FFTW [23] package to 


perform Fourier transforms. The Imagemagick suite includes brute-force 
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spatial domain normalized cross correlation and root mean squared error 
metrics for template matching. A 2010 vintage 2.66 GHz Intel Core 2 Duo 
Mac Mini was used for testing. Imagemagick, which can be configured for 
multi-threaded operation via OpenMP, was limited to one thread for most of 


these tests. 


Figures 1 show the metric surfaces computed with NCC, RMSE and DPC 
correlation methods, respectively. The large image had dimensions of 
256x256 and the small image was a 128x128 subsection located at 
coordinates (64,52). Each metric successfully found the correct match 
location. The NCC score was 1.00, the RMSE score was 0.00 and the DPC 
score was 1.00 to two decimal places. No tests were performed in this study 
for robustness against noise, image distortions or different images. For the 
RMSE surface, the image in Figure 1 has been inverted to show bright 
values for the best match. The Sobel edge detector was used to create the X 
and Y edge directional derivatives needed for the DPC. The DPC metric 
surface shows only a very small white dot at the correct locations, whereas 


the surfaces for the other methods have a wider peak. 
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Figure 1 (a) 256x256 large image, (b) 128x128 small image, (c) NCC 


metric surface, (d) RMSE metric surface and (e) DPC metric image. 


Tests, also, were performed to compare the run-times of both the brute force 
(1) and the Fourier transform (9) approaches for the NCC metric. 
Comparisons were made for a 500x500 color large image and various square 
sizes for the small image ranging from 10x10 to 450x450. The resulting 
CPU times are shown in Table 1 along with their ratios, which characterize 
the speed-up factor associated with the frequency domain method compared 
to the spatial domain method. The last columns shows the estimated ratio 
between the brute force method and Lewis’s method, where the denominator 
is computed with summed area tables and the numerator is computed with 
the Fourier transform evaluation of the cross correlation. The estimates are 
based upon the computational burden analysis presented in Lewis’s papers 


and summarized in Table 2. The estimates are based solely on counts of 
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additions, subtractions and multiplications, each weighted equally. The 
estimates do not take into account memory and other limiting factors. The 
computation of the Fourier transform cross correlation is also subject to type 
of radix approach used in actual implementation. Consequently, these 
estimated speed-up factors should be considered with liberal uncertainty. 
Nevertheless, Table 1 shows that substantial performance improvements can 
be achieved by Lewis’s method. However, the method proposed in this study 
using Fourier transforms to evaluate both the numerator and denominator 
would appear to be even faster and independent of the size of the smaller 
image. Table 3 shows the same data, but for dual threading. Table 4 show 


the same kind of data, but RMSE matching. 


Table 1. Single threaded comparison of run times between spatial and 


Fourier domain normalized cross correlation approaches. 


Large Small NCC NCC Ratio Estimated 
Image Image Spatial Frequency Spatial vs Ratio 
Size Size Domain Domain Frequency Spatial 
Run Times Run Times Domains Domain vs 
seconds seconds Timings Lewis 
Method 
500x500 10x10 22.1 1.6 14 1 
500x500 25x25 39.4 1.6 25 5 
500x500 50x50 91.7 1.5 62 19 
500x500 100x100 247.2 1.4 174 59 
500x500 150x150 411.4 1.4 286 102 
500x500 200x200 536.1 1.5 353 133 
500x500 250x250 593.4 1.5 406 145 
500x500 300x300 554.8 1.5 375 134 
500x500 350x350 427.0 1.5 285 103 
500x500 400x400 262.4 1.5 172 60 
500x500 450x450 84.3 1.6 54 19 
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Table 2. Estimated operation counts between brute force spatial domain and 


Lewis’ method. This is the basis of the numbers in the last column of Table 


l. 


D=large image NCC Estimated Operations NCC Estimated Operations 


square 
dimension 


and 


d=small image 
square 
dimension 


numerator 


denominator 


Spatial Domain 


2*(d*d)*(D-d+1)*(D-d+1) 


3*(d*d)*(D-d+1)*(D-d+1) 


Lewis Method 


30*D*D*log2(D*D) 


4*D*D + 8*(D-d+1)*(D-d+1) 


Table 3. Double threaded comparison of run times between spatial and 


Fourier domain normalized cross correlation approaches. Only a slight gain 


in speed seems to be had between single and double threading. 


Large 
Image 
Size 


500x500 
500x500 
500x500 
500x500 
500x500 
500x500 
500x500 
500x500 
500x500 
500x500 
500x500 


Small 
Image 
Size 


10x10 

25x25 

50x50 
100x100 
150x150 
200x200 
250x250 
300x300 
350x350 
400x400 
450x450 


NCC 
Spatial 
Domain 
Run Times 
seconds 


15.2 
23.4 
51.4 
130.3 
215.4 
289.8 
321.7 
289.2 
289.2 
150.3 
51.7 


NCC 
Frequency 
Domain 
Run Times 
seconds 


1.3 
1.3 
1.3 
1.3 
1.3 
1.4 
1.4 
1.4 
1.4 
1.6 
1.5 
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Ratio 
Spatial vs 
Frequency 

Domains 


11 
18 
39 
98 
161 
215 
235 
210 
207 
93 
36 


Table 4. Single threaded comparison of run times between spatial and 


Fourier domain root mean squared error correlation approaches. 


Large Small RMSE RMSE Ratio 
Image Image Spatial Frequency Spatial vs 
Size Size Domain Domain Frequency 
Run Times Run Times Domains 
seconds seconds 
500x500 10x10 17.6 1.2 15 
500x500 25x25 20.4 1.1 18 
500x500 50x50 27.6 1.1 25 
500x500 100x100 48.8 1.1 44 
500x500 150x150 70.5 1.1 62 
500x500 200x200 88.9 1.1 79 
500x500 250x250 108.4 1.1 95 
500x500 300x300 106.3 1.2 92 
500x500 350x350 82.8 1.2 71 
500x500 400x400 60.6 1.2 50 
500x500 450x450 19.1 1.2 16 


6. CONCLUSION 

This paper has presented a method of performing several types of 
correlation-based template matching, where all major computations are done 
in the Fourier Domain. This approach has proved both efficient and flexible. 
Run times are one to two orders of magnitude faster than doing the same 


types of correlations in the spatial domain. 
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